Selecting Representative Data Sets
نویسندگان
چکیده
A training set is a special set of labeled data providing known information that is used in the supervised learning to build a classification or regression model. We can imagine each training instance as a feature vector together with an appropriate output value (label, class identifier). A supervised learning algorithm deduces a classification or regression function from the given training set. The deduced classification or regression function should predict an appropriate output value for any input vector. The goal of the training phase is to estimate parameters of a model to predict output values with a good predictive performance in real use of the model.
منابع مشابه
Selecting Representative Images from Larger Photo Sets
Selecting a single photograph to represent a set of photographs is a useful approach when creating interfaces to large photo collections. Unfortunately, many different methods for selecting photographs are used in practice today, and none have been well studied or shown to be particularly effective. In this paper we look at several different common methods for selecting a single photograph in o...
متن کاملA Framework for Mining High Dimensional Data for Feature Subset Selection
Features are representative characteristics of data sets. Identifying such fetures in a high dimensional dataset play an important role in real world applications. Data mining is best used to determine important features. Selecting important features from a subject of identified features can help in making expert decisions. However, efficient identification of such feature subset and selection ...
متن کاملDensity-Based Multiscale Data Condensation
ÐA problem gaining interest in pattern recognition applied to data mining is that of selecting a small representative subset from a very large data set. In this article, a nonparametric data reduction scheme is suggested. It attempts to represent the density underlying the data. The algorithm selects representative points in a multiscale fashion which is novel from existing density-based approa...
متن کاملShapes of Features and a Modified Measure for Linear Discriminant Analysis
In this paper, the problem of selecting most representative features among a feature set is considered. Two new feature selection algorithms are introduced and their performances are compared with some well-known feature selection algorithms. The algorithms are tested with the iris data set, three artificially generated data sets and a data set obtained from steel surfaces.
متن کاملSelecting a representative training set for the classi®cation of demolition waste using remote NIR sensing
In the AUTOSORT project, the goal is the separation of demolition waste in three fractions: wood, plastics and stone. A remote near-infrared sensor measures reduced re ̄ectance spectra (mini-spectra) of objects. Linear discriminant analysis (LDA) is used for the classi®cation of these spectra. To obtain the LDA model, a representative training set is needed. New LDA-models will be regularly need...
متن کاملSummarizing Sets of Categorical Sequences - Selecting and Visualizing Representative Sequences
This paper is concerned with the summarization of a set of categorical sequence data. More specifically, the problem studied is the determination of the smallest possible number of representative sequences that ensure a given coverage of the whole set, i.e. that have together a given percentage of sequences in their neighborhood. The goal is to yield a representative set that exhibits the key f...
متن کامل